# Real-time audio processing

Pyannote Segmentation

A speaker segmentation model that uses powerset encoding. It processes 10-second audio clips and identifies multiple speakers, including where their speech overlaps.
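As a rough illustration of what powerset encoding means here: instead of predicting one independent label per speaker, the model picks a single class per frame, where each class stands for a set of simultaneously active speakers. The sketch below enumerates such classes and maps a class back to per-speaker activity. The speaker count (3), the overlap limit (2), and the function names are assumptions chosen for the example; the listing does not state them.

```python
from itertools import combinations


def powerset_classes(num_speakers: int, max_overlap: int) -> list[tuple[int, ...]]:
    """Enumerate every speaker subset of size 0..max_overlap.

    Each subset is one output class: () means no speech, (0,) means only
    speaker 0 is active, (0, 1) means speakers 0 and 1 overlap, and so on.
    """
    classes: list[tuple[int, ...]] = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def to_multilabel(class_index: int, classes: list[tuple[int, ...]], num_speakers: int) -> list[int]:
    """Convert a single powerset class index into per-speaker 0/1 activity."""
    active = [0] * num_speakers
    for speaker in classes[class_index]:
        active[speaker] = 1
    return active


# Illustrative values only (assumed, not taken from the listing).
classes = powerset_classes(num_speakers=3, max_overlap=2)
for index, members in enumerate(classes):
    print(index, members, to_multilabel(index, classes, 3))
```

With these assumed values the class list has seven entries: one for silence, three for a single speaker, and three for overlapping pairs.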

Audio Processing

Speaker Diarization 2.5

A speaker diarization model adapted from pyannote/speaker-diarization-3.0 that uses speechbrain/spkrec-ecapa-voxceleb for speaker embeddings and performs better in some tests.

Speaker Analysis

Whisper Large V3 Turbo Russian

A Russian automatic speech recognition (ASR) model based on OpenAI Whisper Large V3 Turbo, fine-tuned on the Mozilla Common Voice 17 Russian dataset.

Speech Recognition

Transformers, Other

Voice Gender Classifier

A pre-trained model based on the ECAPA-TDNN architecture that classifies speaker gender from human speech.

Audio Classification

Pyannote Segmentation 30

This is an audio processing model for speaker diarization, capable of detecting speech activity, overlapping speech, and multiple speakers.

Audio Processing

Faster Whisper Large V3

Whisper large-v3 is a large-scale multilingual automatic speech recognition (ASR) model developed by OpenAI, supporting speech-to-text tasks in multiple languages.

Speech Recognition, Supports Multiple Languages

Speaker Diarization 3.1

An audio processing model for speaker segmentation that can automatically detect and segment different speakers in audio.

Speaker Analysis

Segmentation 3.0

This is a powerset-encoded speaker diarization model capable of processing 10-second audio clips to identify multiple speakers and their overlapping speech.

Speaker Analysis

Faster Whisper Large V2

The CTranslate2 conversion of OpenAI's Whisper large-v2 model, for efficient speech recognition.

Speech Recognition, Supports Multiple Languages

Pyannote Speaker Diarization Endpoint

A speaker diarization model based on pyannote.audio 2.0 that automatically detects speaker changes and speech activity in audio.

Speaker Analysis

Wav2vec2 Large Xlsr 53 Spanish

A large-scale cross-lingual speech recognition model released by Facebook, based on the Wav2Vec2 architecture and optimized for Spanish.

Speech Recognition, Spanish

Fasnettac Paper

An audio separation model trained with the Asteroid framework, designed for separating noisy multi-channel audio signals.

Sound Separation

Convtasnet Libri1Mix Enhsingle

A ConvTasNet model trained with the Asteroid framework for single-channel speech enhancement.

Audio Enhancement

Quran Speech Recognizer

This model is a transfer learning-based Arabic speech recognition system specifically designed to identify Quran recitations and locate corresponding chapters.

Speech Recognition

© 2025 AIbase